Optimization Algorithms and Equilibrium Analysis for Dynamic Resource Allocation
Abstract
We consider optimization and equilibrium models and algorithms for dynamic resource allocation. The most important accomplishment of the project is the paper “The Simplex and Policy-Iteration Methods are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate,” where I proved that the classic policy-iteration method (Howard 1960), including the Simplex method (Dantzig 1947) with the most-negative-reduced-cost pivoting rule, is a strongly polynomial-time algorithm for solving the Markov decision problem (MDP) exactly with any fixed discount factor. The Markov decision process (e.g., Shapley, 1953) is arguably one of the most widely used decision models in practice, a celebrated example that showcases the power of optimization to help make sensible decisions in a complex system and stochastic environment. These two methods are the most widely used in real-world applications, but their theoretical complexities were open before our result.

We also explore online and dynamic resource allocation and mechanism design in a set of research publications covering prediction markets, Internet auctions, spectrum allocation/trading models, and sensor network localization, where demands for resources arrive sequentially and a decision or trade has to be made as soon as a demand order (or a pair of orders) arrives. We develop online algorithms and decision rules, similar to those for online routing and online combinatorial auctions, for general dynamic resource allocation to achieve near-optimal social utility and/or resource utilization. We outline the major accomplishments of the project below.

Markov Decision Process

[1] Ye, “The Simplex and Policy-Iteration Methods are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate,” Mathematics of Operations Research, 36:4 (2011) 593-603.

The Markov decision process (e.g., Shapley, 1953) is arguably one of the most widely used dynamic decision models in practice.
It has been an integral part of virtually every textbook on Operations Research, a celebrated example that showcases the power of optimization to help make sensible decisions in a complex system and stochastic environment. Although many heuristic and linear programming methods have been proposed and well studied, it had been a long-standing open problem to find a strongly polynomial-time algorithm (one whose running time bound is independent of the problem data) for solving the Markov decision problem (MDP). Ye first resolved this open problem partially in 2005 (Ye, “A new complexity result on solving the Markov decision problem,” Mathematics of Operations Research, 30:3 (2005) 733-749). There, he developed a novel combinatorial interior-point algorithm and proved a strongly polynomial-time bound for solving the MDP exactly when the discount factor is fixed. This was the first strongly polynomial-time algorithm for the MDP, even with a fixed discount factor. The only previously known result was a strongly polynomial-time algorithm (Papadimitriou and Tsitsiklis, 1987) for the deterministic MDP, which reduces to a minimum-cycle network flow problem and is solved using Karp’s minimum-cycle network flow algorithm.

More impressively, Ye recently proved in [1] that the classic policy-iteration method (Howard 1960), including the Simplex method (Dantzig 1947) with the most-negative-reduced-cost pivoting rule, is also a strongly polynomial-time algorithm for solving the MDP exactly with any fixed discount factor. Furthermore, the complexity bound of the policy-iteration method (including the Simplex method) is better than that of the interior-point algorithm, which matches its superior practical performance.
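To make the method under discussion concrete, here is a minimal Python sketch of Howard's policy iteration for a discounted, cost-minimizing MDP. It is an illustration under stated assumptions, not the formulation or analysis of [1]: the data layout (`mdp[s][a] = (cost, {next_state: probability})`), the function names, and the small two-state example are all hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed layout: mdp[s][a] = (immediate cost, {next_state: probability}).
MDP = List[List[Tuple[float, Dict[int, float]]]]


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(mdp: MDP, policy: List[int], gamma: float) -> List[float]:
    """Policy evaluation: solve (I - gamma * P_pi) v = c_pi."""
    n = len(mdp)
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [0.0] * n
    for s in range(n):
        cost, probs = mdp[s][policy[s]]
        b[s] = cost
        for t, p in probs.items():
            A[s][t] -= gamma * p
    return solve_linear(A, b)


def policy_iteration(mdp: MDP, gamma: float) -> Tuple[List[int], List[float]]:
    """Howard's rule: switch every state whose action can be improved."""
    policy = [0] * len(mdp)
    while True:
        v = evaluate(mdp, policy, gamma)
        improved = False
        for s, actions in enumerate(mdp):
            def q(a: int) -> float:
                cost, probs = actions[a]
                return cost + gamma * sum(p * v[t] for t, p in probs.items())
            best = min(range(len(actions)), key=q)
            if q(best) < q(policy[s]) - 1e-12:
                policy[s] = best
                improved = True
        if not improved:
            return policy, v


# Hypothetical example: state 0 can stay (cost 1) or move to state 1
# (cost 0.5); state 1 is absorbing with zero cost.
example: MDP = [
    [(1.0, {0: 1.0}), (0.5, {1: 1.0})],
    [(0.0, {1: 1.0})],
]
best_policy, values = policy_iteration(example, gamma=0.5)
```

This sketch switches all improvable states in each round. The Simplex method with the most-negative-reduced-cost rule corresponds to switching only the single state-action pair with the largest improvement per iteration.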
The result is surprising because the simplex method with the same pivoting rule was shown to be exponential for solving a general linear programming problem (Klee and Minty, 1972), the simplex method with the smallest-index pivoting rule was shown to be exponential for solving an MDP regardless of discount rates (Melekopoglou and Condon, 1994), and the policy-iteration method was recently shown to be exponential for solving undiscounted MDPs under the average-cost criterion. This is a remarkable result, given that these methods have existed for 50 to 60 years, were studied extensively by many excellent researchers, and are widely used in real-world applications. In addition, Ye’s analyses were adapted by a group of computer scientists (Hansen, Miltersen and Zwick, 2011) to show that the policy or strategy iteration method is strongly polynomial for 2-player turn-based stochastic games with discounted zero-sum rewards. This provides the first strongly polynomial algorithm for solving these games, resolving a long-standing open problem.

Online Optimization and Mechanism Design

[2] Agrawal, Delage, Peters, Wang, and Ye, “A Unified Framework for Dynamic Prediction Market Design,” Operations Research, 59:3 (2011) 550-568.
[3] Agrawal, Ding, Saberi, and Ye, “Price of Correlations in Stochastic Optimization,” to appear in Operations Research.

Recently, coinciding with and perhaps driving the increased popularity of prediction markets, several novel pari-mutuel mechanisms have been developed, such as the logarithmic market-scoring rule (LMSR), the cost-function formulation of market makers, utility-based markets, and the sequential convex pari-mutuel mechanism (SCPM). In [2], we present a convex optimization framework that unifies these seemingly unrelated models for centrally organizing contingent claims markets. The existing mechanisms can be expressed in our unified framework by varying the choice of a concave value function.
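As a point of reference for one of the mechanisms named above, the following is a minimal Python sketch of the logarithmic market-scoring rule in its standard cost-function form, C(q) = b log Σ_i exp(q_i / b), with prices given by the gradient of C. It does not implement the unified framework of [2]; the function names and the liquidity parameter `b` are illustrative choices.

```python
import math
from typing import List


def lmsr_cost(q: List[float], b: float) -> float:
    """LMSR cost function C(q) = b * log(sum_i exp(q_i / b)).

    q[i] is the number of outstanding shares on outcome i and b > 0 is
    the liquidity parameter. The max is subtracted for numerical stability.
    """
    m = max(x / b for x in q)
    return b * (m + math.log(sum(math.exp(x / b - m) for x in q)))


def lmsr_prices(q: List[float], b: float) -> List[float]:
    """Instantaneous prices: the gradient of C, a softmax of q / b."""
    m = max(x / b for x in q)
    weights = [math.exp(x / b - m) for x in q]
    total = sum(weights)
    return [w / total for w in weights]


def trade_cost(q: List[float], delta: List[float], b: float) -> float:
    """Amount a trader pays to move the share vector from q to q + delta."""
    new_q = [x + d for x, d in zip(q, delta)]
    return lmsr_cost(new_q, b) - lmsr_cost(q, b)


# Hypothetical three-outcome market with no shares sold yet.
q0 = [0.0, 0.0, 0.0]
prices = lmsr_prices(q0, b=10.0)
payment = trade_cost(q0, [5.0, 0.0, 0.0], b=10.0)
```

For the LMSR, the market maker's worst-case loss is b log n for n outcomes, so it grows with the number of states.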
We show that this framework is equivalent to a convex risk minimization model for the market maker. This facilitates a better understanding of the risk attitudes adopted by various mechanisms. The unified framework also leads to easy implementation because we can now find the cost function of a market maker in polynomial time by solving a simple convex optimization problem. In addition to unifying and explaining the existing mechanisms, we use the generalized framework to derive necessary and sufficient conditions for many desirable properties of a prediction market mechanism, such as proper scoring, truthful bidding (in a myopic sense), efficient computation, controllable risk measure, and guarantees on the worst-case loss. As a result, we develop the first proper, truthful, risk-controlled, loss-bounded (independent of the number of states) mechanism; none of the previously proposed mechanisms possessed all these properties simultaneously. Thus, our work provides an effective tool for designing new prediction market mechanisms.

We also discuss possible applications of our framework to dynamic resource pricing and allocation in general trading markets. We believe that the framework of [2] for designing dynamic prediction markets has intimate connections to other dynamic trading markets, such as online auctions of goods, and could lead to interesting results for these markets as well. In general, any dynamic resource allocation and pricing scheme relies crucially on the trade-off between the profit achieved by exploiting the resource now and the value of saving the goods for the future and exploring the market further. This future value of resources is captured in our framework by the concave value function. Our risk-based formulation also formalizes, via a penalty function, how this value function captures the trade-off between learning the preferences of the traders and maximizing instant profit.
This bears similarities to the classic exploration versus exploitation trade-off for general trading markets. Additionally, our mechanism achieves incentive compatibility using the VCG allocation and pricing scheme popular for online auctions of goods. Further investigation of the implications of our results for other trading and auction markets is part of ongoing research.

When decisions are made in the presence of high-dimensional stochastic data, handling the joint distribution of correlated random variables can be a formidable task, both in terms of sampling and estimation and in terms of algorithmic complexity. A common heuristic is to estimate only the marginal distributions and substitute the independent (product) distribution for the joint distribution. In [3], we study the possible loss incurred by ignoring correlations through a distributionally robust stochastic programming (DRSP) model, and quantify that loss as the Price of Correlations (POC). Using cost-sharing techniques from game theory, we identify a wide class of problems for which POC has a small upper bound. Notably, this class includes many stochastic convex programs, uncapacitated facility location, Steiner tree, and submodular functions, suggesting that the intuitive approach of assuming an independent distribution approximates the robust model well for these stochastic optimization problems. Additionally, we demonstrate the hardness of bounding POC via examples of subadditive and supermodular functions that have large POC. We find that our results are also useful for solving many deterministic optimization problems, such as welfare maximization, k-dimensional matching, and the transportation problem, under certain conditions. In summary, [3] proposes an approximation algorithm for the DRSP model that simply ignores the correlations and can be implemented efficiently, and introduces the new concept of POC to measure the approximation ratio achieved.
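One way to write the quantity described above is the following. The notation is an assumption made here for illustration and may differ from that of [3]: h(x, ξ) is the cost of decision x under random outcome ξ, p_1, ..., p_n are the given marginals, and 𝒟(p_1, ..., p_n) is the set of joint distributions consistent with them.

```latex
\begin{align*}
g(x) &= \max_{D \in \mathcal{D}(p_1,\dots,p_n)} \mathbb{E}_{\xi \sim D}\,[\,h(x,\xi)\,]
  && \text{(worst-case expected cost of } x\text{)} \\
x_R &\in \arg\min_{x}\, g(x)
  && \text{(robust solution)} \\
x_I &\in \arg\min_{x}\, \mathbb{E}_{\xi \sim p_1 \times \cdots \times p_n}[\,h(x,\xi)\,]
  && \text{(solution assuming independence)} \\
\mathrm{POC} &= \frac{g(x_I)}{g(x_R)} \;\ge\; 1
\end{align*}
```

Under this reading, a POC close to 1 means that the decision computed under the independence assumption loses little even against the worst-case joint distribution.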
We believe the concept of POC is especially attractive because it characterizes the cases in which the seemingly pessimistic worst-case joint distribution is close to the more natural independent distribution, in the sense that the former can be substituted by the latter. By proving upper and lower bounds on POC for a wide range of problems, our research offers important insights into when correlations can be ignored in practice. We also show that many deterministic optimization problems that involve matching or partitioning constraints can be formulated as the problem of computing the worst-case distribution with given marginals. Hence, our results provide approximation algorithms for those problems as well. Finally, our methodology of bounding POC using cost-sharing schemes is a novel application of these algorithmic game theory techniques and deserves further study.

Computational Game Theory and Market Equilibrium

[4] Zhu, Dang and Ye, “A FPTAS for Computing a Symmetric Leontief Competitive Economy Equilibrium,” Mathematical Programming 131 (2012) 113-129.
[5] Dang, Zhu and Ye, “An interior-point path-following algorithm for computing a Leontief economy equilibrium,” Computational Optimization and Applications 50:2 (2011).
[6] Yao, Armbruster, and Ye, “Dynamic Spectrum Management with the Competitive Market Model,” IEEE Transactions on Signal Processing 58:4 (2010) 2442-2446.

The Arrow-Debreu competitive market equilibrium problem was first formulated by Léon Walras in 1874. In this problem, everyone in a population of n players has an initial endowment of a divisible good and a utility function for consuming all goods, their own and others'. Every player sells the entire initial endowment and then uses the revenue to buy a bundle of goods such that his or her utility function is maximized. Walras asked whether prices could be set for everyone's good such that this is possible.
An answer was given by Arrow and Debreu (1954), who showed that such an equilibrium exists if the utility functions are concave. Their proof was nonconstructive and did not offer any algorithm for finding such equilibrium prices. Fisher (1891) was the first to consider an algorithm to compute equilibrium prices for a special-case model in which players are divided into two sets: producers and consumers. Scarf (1973) also developed an algorithm to solve general fixed-point problems, including the competitive market equilibrium problem. His algorithm, however, was not proved to be polynomial-time. For linear utility functions, Eisenberg and Gale (1959) gave a nonlinear convex optimization formulation of the Fisher model, and Nenakhov and Primak (1983; see also Jain 2004) gave a nonlinear convex optimization formulation of the general Arrow-Debreu model. Thus, the Ellipsoid method can be used to solve them approximately in polynomial time, where the bound is of order O(nL). Recently, Ye developed an interior-point algorithm that solves both the Fisher and Arrow-Debreu models exactly, when the utilities are linear, in polynomial time of order O(nL), which is in line with the best complexity bound for linear programming of the same dimension and size.

These results have motivated many researchers in the past few years to look for polynomial-time algorithms for equilibrium problems with more general utilities, such as the Leontief utility, which is a piecewise linear concave function. Soon afterward, however, Ye and his co-authors showed that computing the Arrow-Debreu equilibrium with Leontief utilities is NP-hard, which effectively ends any hope of developing a polynomial-time algorithm unless P=NP. On the positive side, in [4], we consider a linear complementarity problem (LCP) arising from Nash and Arrow-Debreu competitive economy equilibria with Leontief utilities, where the LCP coefficient matrix is symmetric.
We prove that the decision problem, that is, deciding whether or not a complementary solution exists, is NP-complete. Under certain conditions, an LCP solution is guaranteed to exist, and we present a fully polynomial-time approximation scheme (FPTAS) for approximating a complementary solution, even though the LCP solution set can be non-convex or disconnected. Our method is based on approximating a quadratic social utility optimization problem (QP) and showing that a certain KKT point of the QP is an LCP solution. We then show that such a KKT point can be approximated with a new, improved running time complexity. We also report computational results which show that the method is highly effective. Applications to competitive market models with other utility functions are also presented, including global trading and dynamic spectrum management problems.

Then, in [5], we present an interior-point path-following algorithm for computing a nonsymmetric Leontief economy equilibrium, that is, an exchange market equilibrium with Leontief utility functions, a problem known to be PPAD-complete. We construct a smooth homotopy interior-point path to tackle this system. We prove that there always exists a continuously differentiable path leading to a complementary solution of the nonlinear system and, at the same time, to a Leontief economy equilibrium associated with the solution. We also report preliminary computational results that show the effectiveness of the path-following Newton method.

Dynamic spectrum management (DSM) is a technology for efficiently sharing the spectrum among the users of a communication system. DSM can be used in digital subscriber line (DSL) systems to reduce cross-talk interference and improve total system throughput. DSM is also a promising candidate for multiple access in cognitive radio. In DSM, multiple users coexist in a channel, and this causes co-channel interference.
The goal of DSM is to manage the power allocations in all the channels to maximize the sum of the data rates of all the users, subject to power constraints. Unfortunately, this problem is non-convex and cannot be solved efficiently. Earlier work has shown that DSM using the market competitive equilibrium (CE), which sets a price for transmission power on each channel, leads to better system performance in terms of the total data transmission rate (by reducing cross-talk) than using the Nash equilibrium (NE). But how to achieve such a CE was an open problem. In [6], we show that the CE is the solution of a linear complementarity problem (LCP) and can be computed efficiently. We propose a decentralized tâtonnement process for adjusting the prices to achieve a CE. We show that, under reasonable conditions, any tâtonnement process converges to the CE. The conditions are that users of a channel experience the same noise levels and that the cross-talk effects between users are low-rank and weak.

Optimization for Sensor Network

[7] Zhu, So and Ye, “Universal Rigidity and Edge Sparsification for Sensor Network Localization,” to appear in SIAM J. Optimization, 2012.
[8] Alfakih, Taheri, and Ye, “On stress matrices of (d + 1)-lateration frameworks in general position,” to appear in Mathematical Programming, 2012.

Owing to their high accuracy and ease of formulation, convex optimization techniques, particularly semidefinite programming (SDP) relaxation, have attracted great interest in recent years as a way to tackle the sensor network localization problem. However, a drawback of such techniques is that the resulting convex program is often expensive to solve. To speed up computation, various edge sparsification heuristics have been proposed, whose aim is to reduce the number of edges in the input graph.
Although these heuristics do reduce the size of the convex program and hence make it faster to solve, they are often ad hoc in nature and do not preserve the localization properties of the input. As such, one often has to face a trade-off between solution accuracy and computational effort. In [7], we propose a novel edge sparsification heuristic that provably preserves the localization properties of the original input. At the heart of our heuristic is a graph decomposition procedure, which allows us to identify certain sparse generically universally rigid subgraphs of the input graph. Our computational results show that the proposed approach can significantly reduce the computational and memory complexities of SDP-based algorithms for solving the sensor network localization problem. Moreover, it compares favorably with existing speedup approaches, both in terms of accuracy and solution time.

Paper [8] is a technical paper that resolves an important mathematical question on the rigidity of a sensor network and structure. Let (G, P) be a bar framework of n vertices in general position in R^d, for d ≤ n − 1, where G is a (d + 1)-lateration graph. In [8], we present a constructive proof that (G, P) admits a positive semidefinite stress matrix with rank n − d − 1. We also prove a similar result for a sensor network, where the graph contains m (≥ d + 1) anchors.
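The graph condition in the last paragraph can be checked mechanically. The Python sketch below assumes the following definition, which should be verified against [8]: a graph is a (d + 1)-lateration graph if its vertices can be ordered so that the first d + 1 form a clique and every later vertex is adjacent to at least d + 1 earlier vertices. The function name and input format are hypothetical, and the function only verifies a given ordering rather than searching for one.

```python
from typing import Iterable, List, Tuple


def is_lateration_order(n: int,
                        edges: Iterable[Tuple[int, int]],
                        order: List[int],
                        d: int) -> bool:
    """Check whether `order` certifies a (d + 1)-lateration graph.

    Assumed definition: the first d + 1 vertices in `order` form a clique,
    and each later vertex has at least d + 1 neighbors that come earlier.
    Vertices are labeled 0 .. n - 1.
    """
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # The ordering must be a permutation of all vertices.
    if len(order) != n or set(order) != set(range(n)) or n < d + 1:
        return False

    # The first d + 1 vertices must be pairwise adjacent.
    first = order[:d + 1]
    for i, u in enumerate(first):
        for v in first[i + 1:]:
            if v not in adj[u]:
                return False

    # Every later vertex needs at least d + 1 already-placed neighbors.
    placed = set(first)
    for v in order[d + 1:]:
        if len(adj[v] & placed) < d + 1:
            return False
        placed.add(v)
    return True


# Hypothetical planar example (d = 2): a triangle 0-1-2 plus a vertex 3.
triangle = [(0, 1), (0, 2), (1, 2)]
ok = is_lateration_order(4, triangle + [(3, 0), (3, 1), (3, 2)], [0, 1, 2, 3], d=2)
not_ok = is_lateration_order(4, triangle + [(3, 0), (3, 1)], [0, 1, 2, 3], d=2)
```

In the planar case d = 2, this is the familiar trilateration condition: each new sensor is connected to at least three already-placed nodes.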